4.2. Segmentation with Python (I)

Code adapted from: https://www.kaggle.com/arnavkj95/candidate-generation-and-luna16-preprocessing

Now we will do the segmentation but instead of using R, we will be using Python.

First we need to import some Python libraries.

import numpy as np # pip3 install numpy
import pandas as pd # pip3 install pandas
# pip3 install matplotlib
# pip3 install scipy
import skimage # pip3 install scikit-image
import os 
from skimage.morphology import ball, disk, dilation, binary_erosion, remove_small_objects, erosion, closing, reconstruction, binary_closing
from skimage.measure import label,regionprops, perimeter
from skimage.morphology import binary_dilation, binary_opening
from skimage.filters import roberts, sobel
from skimage import measure, feature
from skimage.segmentation import clear_border
from skimage import data
from scipy import ndimage as ndi
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d.art3d import Poly3DCollection
import dicom # pip3 install dicom
import scipy.misc
import pydicom # pip3 install pydicom
import matplotlib.pyplot as plt

Each scan consist in multiple slices. We have all the DICOM images from the scan in one folder. In path_images we indicate the path of the folder.

from subprocess import check_output
path_images = "/Users/andrealetaalfonso/Desktop/TFG/images/Kaggle/ID00184637202242062969203/"
# You can check that everything is loading correctly with: print(check_output(["ls", path_images]).decode("utf8"))

To read the images, we will use the function pydicom.read_file(). Then we will update the intensity values of -2000 with 0. These pixels are the ones that are located outside the scanner bounds.

# pip3 install nltk==3.6.2
lung = pydicom.read_file("/Users/andrealetaalfonso/Desktop/TFG/images/Kaggle/ID00184637202242062969203/20.dcm")
slice = lung.pixel_array
plt.axis('off')
slice[slice == -2000] = 0
plt.imshow(slice, cmap=plt.cm.gray)
plt.show()

We create a function file_is_hidden() to read only .dcm files and not hidden files in the folder that we cannot see in our computers.

if os.name == 'nt':
    import win32api, win32con
def file_is_hidden(p):
    if os.name== 'nt':
        attribute = win32api.GetFileAttributes(p)
        return attribute & (win32con.FILE_ATTRIBUTE_HIDDEN | win32con.FILE_ATTRIBUTE_SYSTEM)
    else:
        return p.startswith('.') #linux-osx
file_list = [f for f in os.listdir(path_images) if not file_is_hidden(f)] 

Now we will read all the images from a folder with a function named read_ct_scan(folder_name).

def read_ct_scan(folder_name):
  # Read the slices from the dicom file
  slices = [pydicom.read_file(folder_name + filename) for filename in os.listdir(folder_name) if not file_is_hidden(filename)]
  # Sort the dicom slices in their respective order
  slices.sort(key=lambda x: int(x.InstanceNumber))
  # Get the pixel values for all the slices
  slices = np.stack([s.pixel_array for s in slices])
  slices[slices == -2000] = 0
  return slices
  
ct_scan = read_ct_scan(path_images) 

Plot some of the images from a folder.

def plot_ct_scan(scan):
    f, plots = plt.subplots(int(scan.shape[0] / 20) + 1, 4, figsize=(25, 25))
    for i in range(0, scan.shape[0], 5):
        plots[int(i / 20), int((i % 20) / 5)].axis('off')
        plots[int(i / 20), int((i % 20) / 5)].imshow(scan[i], cmap=plt.cm.gray) 

plot_ct_scan(ct_scan)
plt.show()

Now we will do the lung segmentation of 1 image.

def get_segmented_lungs2(im, plot=False):
    if plot == True:
        f, plots = plt.subplots()
    # Step 1: Convert into a binary image. 
    binary = im < 604
    # Step 2: Remove the blobs connected to the border of the image.
    cleared = clear_border(binary)
    # Step 3: Label the image.
    label_image = label(cleared)
    # Step 4: Keep the labels with 2 largest areas.
    areas = [r.area for r in regionprops(label_image)]
    areas.sort()
    if len(areas) > 2:
        for region in regionprops(label_image):
            if region.area < areas[-2]:
                for coordinates in region.coords:                
                       label_image[coordinates[0], coordinates[1]] = 0
    binary = label_image > 0
    # Step 5: Erosion operation with a disk of radius 2. This operation is seperate the lung nodules attached to the blood vessels.
    selem = disk(2)
    binary = binary_erosion(binary, selem)
    # Step 6: Closure operation with a disk of radius 10. This operation is to keep nodules attached to the lung wall.
    selem = disk(10)
    binary = binary_closing(binary, selem)
    # Step 7: Fill in the small holes inside the binary mask of lungs.
    edges = roberts(binary)
    binary = ndi.binary_fill_holes(edges)
    if plot == True:
        plots.axis('off')
        plots.imshow(binary, cmap=plt.cm.bone) 
    # Step 8: Superimpose the binary mask on the input image.
    get_high_vals = binary == 0
    im[get_high_vals] = 0
    return im

Image

Mask of the image

get_segmented_lungs2(ct_scan[20], True)
plt.show()

The steps to obtain the segmentation are the following.

Now we will briefly explain the process of radiomics.

  1. First we convert de image into a binary image. This means that every pixel of the image has only one of two possible values. One black and one white.
  2. We remove the spots connected to the edge of the image.
  3. Create a label.
  4. Keep the labels with two largest areas.
  5. Seperate the lung nodules attached to the blood vessels.
  6. Keep nodules attached to the lung wall.
  7. Fill in the small holes inside the binary mask of lungs.
  8. Superimpose the binary mask on the input image.

We can do the segmentation of all the slices of the scan.

def segment_lung_from_ct_scan(ct_scan):
    return np.asarray([get_segmented_lungs2(slice) for slice in ct_scan])
segmented_ct_scan2 = segment_lung_from_ct_scan(ct_scan)
plot_ct_scan(segmented_ct_scan2)
plt.show()


4.3. Segmentation with Python (II)

Code adapted from: https://www.raddq.com/dicom-processing-segmentation-visualization-in-python/

We will study another way to perform segmentation of lung scans using Python.

As before, first we need to import some Python libraries.

import numpy as np
import dicom
import os
import matplotlib.pyplot as plt
from glob import glob
from mpl_toolkits.mplot3d.art3d import Poly3DCollection
import scipy.ndimage
from skimage import morphology
from skimage import measure
from skimage.transform import resize
# pip3 install scikit-learn
from sklearn.cluster import KMeans
# pip3 install plotly
from plotly import __version__
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
from plotly.tools import FigureFactory as FF
from plotly.graph_objs import *
import pydicom

Specify the path where the folder with all the images is, and the path where we want our mask images saved.

data_path = "/Users/andrealetaalfonso/Desktop/TFG/images/Kaggle/ID00184637202242062969203/"
output_path = working_path = "/Users/andrealetaalfonso/Desktop/TFG/"

Read all the DICOM files. To do so, we use glob, which is used to return all file paths that match a specific pattern.

g = glob(data_path + '/*.dcm')

Print out the first 5 file names to verify we are in the right folder.

print("Total of %d DICOM images.\nFirst 5 filenames:" % len(g))
## Total of 62 DICOM images.
## First 5 filenames:
print('\n'.join(g[:5]))
## /Users/andrealetaalfonso/Desktop/TFG/images/Kaggle/ID00184637202242062969203/16.dcm
## /Users/andrealetaalfonso/Desktop/TFG/images/Kaggle/ID00184637202242062969203/17.dcm
## /Users/andrealetaalfonso/Desktop/TFG/images/Kaggle/ID00184637202242062969203/15.dcm
## /Users/andrealetaalfonso/Desktop/TFG/images/Kaggle/ID00184637202242062969203/29.dcm
## /Users/andrealetaalfonso/Desktop/TFG/images/Kaggle/ID00184637202242062969203/28.dcm

Now we loop over the image files and store everything into a list.

def load_scan(path):
    slices = [pydicom.read_file(path + '/' + s) for s in os.listdir(path) if not file_is_hidden(s)]
    slices.sort(key = lambda x: int(x.InstanceNumber))
    try:
        slice_thickness = np.abs(slices[0].ImagePositionPatient[2] - slices[1].ImagePositionPatient[2])
    except:
        slice_thickness = np.abs(slices[0].SliceLocation - slices[1].SliceLocation)
    for s in slices:
        s.SliceThickness = slice_thickness
    return slices

The voxel values in the images are raw. This function converts raw values into Houndsfeld units (a quantitative scale for describing radiodensity).

def get_pixels_hu(scans):
    image = np.stack([s.pixel_array for s in scans])
    # Convert to int16 (from sometimes int16), should be possible as values should always be low enough (<32k)
    image = image.astype(np.int16)
    # Set outside-of-scan pixels to 1. The intercept is usually -1024, so air is approximately 0
    image[image == -2000] = 0
    # Convert to Hounsfield units (HU)
    intercept = scans[0].RescaleIntercept
    slope = scans[0].RescaleSlope
    if slope != 1:
        image = slope * image.astype(np.float64)
        image = image.astype(np.int16)
    image += np.int16(intercept)
    return np.array(image, dtype=np.int16)
id = 0
patient = load_scan(data_path) 
imgs = get_pixels_hu(patient) 

Save in .npy format

np.save(output_path + "fullimages_%d.npy" % (id), imgs)

Now we will check if the Houndsfeld Units (HU) are properly scaled and represented. HU are very important because they are standardized across all CT scans. To give a few examples, air is −1000, lung −500, fat −100 to −50, water 0, blood +30 to +70, muscle +10 to +40 and so on.

Let’s display an histogram to check HU.

file_used=output_path+"fullimages_%d.npy" % id
imgs_to_process = np.load(file_used).astype(np.float64) 
plt.hist(imgs_to_process.flatten(), bins=50, color='c')
plt.xlabel("Hounsfield Units (HU)")
plt.ylabel("Frequency")
plt.show()

With the histogram, we can see that there is a lot of air and water. An abundance of lung, fat, bood and muscle.

We obtain the conclusions that we will need to do preprocessing to study well the lungs only, because there is a lot of air and water.

Now we display some of the images, we will be skipping every two slices.

id = 0
imgs_to_process = np.load(output_path+'fullimages_{}.npy'.format(id))

def sample_stack(stack, rows=4, cols=4, start_with=5, show_every=2):
    fig,ax = plt.subplots(rows,cols,figsize=[12,12])
    for i in range(rows*cols):
        ind = start_with + i*show_every
        ax[int(i/rows),int(i % rows)].set_title('slice %d' % ind)
        ax[int(i/rows),int(i % rows)].imshow(stack[ind],cmap='gray')
        ax[int(i/rows),int(i % rows)].axis('off')
    plt.show()

sample_stack(imgs_to_process)

We can see a lot of gray that represents air.

Let’s see how thick each slice is.

print("Slice Thickness: %f" % patient[0].SliceThickness)
## Slice Thickness: 5.000000
print("Pixel Spacing (row, col): (%f, %f) " % (patient[0].PixelSpacing[0], patient[0].PixelSpacing[1]))
## Pixel Spacing (row, col): (0.722656, 0.722656)

Now we do resampling. We want that each slice is resampled in 1x1x1 mm pixels and slices, because it will be useful to obtain the 3d image.

id = 0
imgs_to_process = np.load(output_path+'fullimages_{}.npy'.format(id))
def resample(image, scan, new_spacing=[1,1,1]):
    # Determine current pixel spacing
    spacing = map(float, ([scan[0].SliceThickness] + list(scan[0].PixelSpacing)))
    spacing = np.array(list(spacing))
    resize_factor = spacing / new_spacing
    new_real_shape = image.shape * resize_factor
    new_shape = np.round(new_real_shape)
    real_resize_factor = new_shape / image.shape
    new_spacing = spacing / real_resize_factor
    image = scipy.ndimage.interpolation.zoom(image, real_resize_factor)
    return image, new_spacing
print ("Shape before resampling\t", imgs_to_process.shape)
## Shape before resampling   (62, 512, 512)
imgs_after_resamp, spacing = resample(imgs_to_process, patient, [1,1,1])
print ("Shape after resampling\t", imgs_after_resamp.shape)
## Shape after resampling    (310, 370, 370)

3D Plotting

def make_mesh(image, threshold=-300, step_size=1):
    p = image.transpose(2,1,0)
    verts, faces, norm, val = measure.marching_cubes(p, threshold, step_size=step_size, allow_degenerate=True) 
    return verts, faces
def plt_3d(verts, faces):
    x,y,z = zip(*verts) 
    fig = plt.figure(figsize=(10, 10))
    ax = fig.add_subplot(111, projection='3d')
    # Fancy indexing: `verts[faces]` to generate a collection of triangles
    mesh = Poly3DCollection(verts[faces], linewidths=0.05, alpha=1)
    face_color = [1, 1, 0.9]
    mesh.set_facecolor(face_color)
    ax.add_collection3d(mesh)
    ax.set_xlim(0, max(x))
    ax.set_ylim(0, max(y))
    ax.set_zlim(0, max(z))
    ax.set_facecolor((0.7, 0.7, 0.7)) # canvio depricated version!
    plt.show()
v, f = make_mesh(imgs_after_resamp, 350)
plt_3d(v, f)

Segmentation

Now we will do the segmentation of the lungs.

# Standardize the pixel values
def make_lungmask(img, display=False):
    row_size= img.shape[0]
    col_size = img.shape[1]
    mean = np.mean(img)
    std = np.std(img)
    img = img-mean
    img = img/std
    # Find the average pixel value near the lungs to renormalize washed out images
    middle = img[int(col_size/5):int(col_size/5*4),int(row_size/5):int(row_size/5*4)] 
    mean = np.mean(middle)  
    max = np.max(img)
    min = np.min(img)
    # To improve threshold finding, I'm moving the underflow and overflow on the pixel spectrum
    img[img==max]=mean
    img[img==min]=mean
    # Using Kmeans to separate foreground (soft tissue / bone) and background (lung/air)
    kmeans = KMeans(n_clusters=2).fit(np.reshape(middle,[np.prod(middle.shape),1]))
    centers = sorted(kmeans.cluster_centers_.flatten())
    threshold = np.mean(centers)
    thresh_img = np.where(img<threshold,1.0,0.0)  # threshold the image
    # First erode away the finer elements, then dilate to include some of the pixels surrounding the lung.  
    # We don't want to accidentally clip the lung.
    eroded = morphology.erosion(thresh_img,np.ones([3,3]))
    dilation = morphology.dilation(eroded,np.ones([8,8]))
    labels = measure.label(dilation) # Different labels are displayed in different colors
    label_vals = np.unique(labels)
    regions = measure.regionprops(labels)
    good_labels = []
    for prop in regions:
        B = prop.bbox
        if B[2]-B[0]<row_size/10*9 and B[3]-B[1]<col_size/10*9 and B[0]>row_size/5 and B[2]<col_size/5*4:
            good_labels.append(prop.label)
    mask = np.ndarray([row_size,col_size],dtype=np.int8)
    mask[:] = 0
    # After just the lungs are left, we do another large dilation in order to fill in and out the lung mask 
    for N in good_labels:
        mask = mask + np.where(labels==N,1,0)
    mask = morphology.dilation(mask,np.ones([10,10])) # one last dilation
    if (display):
        fig, ax = plt.subplots(3, 2, figsize=[12, 12])
        ax[0, 0].set_title("Original")
        ax[0, 0].imshow(img, cmap='gray')
        ax[0, 0].axis('off')
        ax[0, 1].set_title("Threshold")
        ax[0, 1].imshow(thresh_img, cmap='gray')
        ax[0, 1].axis('off')
        ax[1, 0].set_title("After Erosion and Dilation")
        ax[1, 0].imshow(dilation, cmap='gray')
        ax[1, 0].axis('off')
        ax[1, 1].set_title("Color Labels")
        ax[1, 1].imshow(labels)
        ax[1, 1].axis('off')
        ax[2, 0].set_title("Final Mask")
        ax[2, 0].imshow(mask, cmap='gray')
        ax[2, 0].axis('off')
        ax[2, 1].set_title("Apply Mask on Original")
        ax[2, 1].imshow(mask*img, cmap='gray')
        ax[2, 1].axis('off')
        plt.show()
    return mask*img

We know obtain the segmentation to only one image.

img = imgs_after_resamp[63]
make_lungmask(img, display=True)

Anf finally we apply it to all the slices.

masked_lung = []
for img in imgs_after_resamp:
    masked_lung.append(make_lungmask(img))
sample_stack(masked_lung, show_every=10)

Save masks in .npy format.

np.save(output_path + "maskedimages_%d.npy" % (id), imgs)


4.4. Segmentation with lungmask

And finally, we can also do segmentation with lungmask.

First we import the necessary Python libraries.

from lungmask import mask
import SimpleITK as sitk
import os
import pydicom
from pydicom.data import get_testdata_files
import numpy as np
import matplotlib.pyplot as plt
import cv2
import time

Function to read one image from a specified path.

def get_img_sitk(path):
    return sitk.ReadImage(path)

Function to read multiple images from a specified path.

def get_series_sitk(path):
    reader = sitk.ImageSeriesReader()
    reader.MetaDataDictionaryArrayUpdateOn()
    reader.LoadPrivateTagsOn()
    dicom_names = reader.GetGDCMSeriesFileNames(path)
    reader.SetFileNames(dicom_names)
    return reader.Execute(), reader

Function to generate a mask.

def generate_mask(img):
  segmentation = mask.apply(img)
  segmentation[segmentation > 0] = 1
  if img.GetSize()[2] > 1:
    masked_img = np.zeros((img.GetSize()[2], img.GetSize()[0], img.GetSize()[1]))
    for i in range(1, segmentation.__len__()):
      masked_img[i,:,:] = np.where(segmentation[i,:,:] == 1, sitk.GetArrayFromImage(img)[i,:,:], 0)
  else:
    masked_img = np.where(segmentation == 1, sitk.GetArrayFromImage(img)[0,:,:], 0)
    masked_img = masked_img[0,:,:]
    segmentation = segmentation[0,:,:]
  return segmentation, masked_img

To save the images.

def create_writer(series_reader):
    writer = sitk.ImageFileWriter()
    writer.KeepOriginalImageUIDOn()
    tags_to_copy = ["0010|0010",  # Patient Name
                    "0010|0020",  # Patient ID
                    "0010|0030",  # Patient Birth Date
                    "0020|000D",  # Study Instance UID, for machine consumption
                    "0020|0010",  # Study ID, for human consumption
                    "0008|0020",  # Study Date
                    "0008|0030",  # Study Time
                    "0008|0050",  # Accession Number
                    "0008|0060"  # Modality
                    ]
    modification_time = time.strftime("%H%M%S")
    modification_date = time.strftime("%Y%m%d")
    
    series_tag_values = [
                        (k, series_reader.GetMetaData(0, k))
                        for k in tags_to_copy
                        if series_reader.HasMetaDataKey(0, k)] + \
                    [("0008|0031", modification_time),  # Series Time
                     ("0008|0021", modification_date),  # Series Date
                     ("0008|0008", "DERIVED\\SECONDARY"),  # Image Type
                     ("0020|000e", "1.2.826.0.1.3680043.2.1125." +
                      modification_date + ".1" + modification_time),
                     ("0008|103e","Processed-SimpleITK")]  # Series Description
    return writer, series_tag_values

To obtain and save the masks in DICOM format.

def get_and_save_masks(series_path):
    # Get series of dicom images
    series, series_reader = get_series_sitk(series_path)
    
    # Get masked lungs and mask filters
    mask_filter, masked_lung = generate_mask(series)
    
    # Save masks
    writer, series_tag_values = create_writer(series_reader)
    
    for id in range(len(mask_filter)):
        image_slice = sitk.GetImageFromArray(mask_filter[id,:,:])
        # Tags shared by the series.
        for tag, value in series_tag_values:
            image_slice.SetMetaData(tag, value)
        # Slice specific tags.
        #   Instance Creation Date
        image_slice.SetMetaData("0008|0012", time.strftime("%Y%m%d"))
        #   Instance Creation Time
        image_slice.SetMetaData("0008|0013", time.strftime("%H%M%S"))
        #   Instace Number
        image_slice.SetMetaData("0020|0013", str(id))
        # Check if new directory exists
        if not os.path.exists(series_path + "_mask/"):
          os.makedirs(series_path + "_mask/")
        # Write to the output directory and add the extension dcm, to force writing
        # in DICOM format.
        writer.SetFileName(series_path + "_mask/" + str(id) + ".dcm")
        writer.Execute(image_slice)

Plot one image.

ds = pydicom.dcmread("/Users/andrealetaalfonso/Desktop/TFG/DICOMS/id_1/std_38.dcm")
plt.imshow(ds.pixel_array, cmap='gray')
plt.show()

Now we create a mask for the image above.

image1 <- get_img_sitk("/Users/andrealetaalfonso/Desktop/TFG/DICOMS/id_1/std_38.dcm")
mask_filter_single, masked_lung_single = generate_mask(image1)
plt.imshow(mask_filter_single, cmap='gray')
plt.show()
plt.imshow(masked_lung_single, cmap='gray')
plt.show()

We can visualize it all in one panel.

fig, ax = plt.subplots(1,3)
ax[0].set_title("Original", fontsize='small')
ax[0].imshow(ds.pixel_array, cmap='gray')
ax[0].axis('off')

ax[1].set_title("Segmented", fontsize='small')
ax[1].imshow(mask_filter_single, cmap='gray')
ax[1].axis('off')

ax[2].set_title("Masked", fontsize='small')
ax[2].imshow(masked_lung_single, cmap='gray')
ax[2].axis('off')

plt.show()

Now, we can do the same for all of the images in the folder.

We generate the masks and save them in the path + _mask.

get_and_save_masks("/Users/andrealetaalfonso/Desktop/TFG/DICOMS/id_1")


5. Radiomic features with masks

Now that we have in one folder all the images, and in another one all the masks, we can start computing the radiomic features of the images with the masks.

We will be using lungmask.

Load the images and the masks with the funcion load_dicom().

library(RIA)
images = load_dicom(filename = "/Users/andrealetaalfonso/Desktop/TFG/DICOMS/id_1", 
                    mask_filename = "/Users/andrealetaalfonso/Desktop/TFG/DICOMS/id_1_mask")
## Warning in oro.dicom::create3D(dcmImages_mask, mode = mode_in, transpose =
## transpose_in, : ImagePositionPatient is moving in more than one dimension.

Compute the radiomic features of the images.

images = first_order(RIA_data_in = images)
results <- RIA:::list_to_df(images$stat_fo$orig)
name_characteristics <- rownames(results)
library(rmarkdown)
paged_table(results)

Multiple patients at the same time

Now we will do the same but for 5 patients (in our case) at the same time.

First obtain and save the mask of the images.

# List of all the patient folders
patients = [f for f in os.listdir("/Users/andrealetaalfonso/Desktop/TFG/images/Kaggle") if not f.startswith('.')]

for i in patients: # for loop through all the patients
    path = os.path.join("/Users", "andrealetaalfonso", "Desktop","TFG", "images", "Kaggle", i)
    get_and_save_masks(path) # obtain and save masks

Now we compute the radiomic features.

library(RIA)
library(rmarkdown)

res <- c()

images_folder_names <- grep(list.files(path="/Users/andrealetaalfonso/Desktop/TFG/images/Kaggle"), pattern='mask', invert=TRUE, value=TRUE)
mask_folder_names <- list.files("/Users/andrealetaalfonso/Desktop/TFG/images/Kaggle", pattern="mask")

for(patient in 1:length(images_folder_names)){
  images_folder_path <- file.path("/Users", "andrealetaalfonso", "Desktop","TFG", "images", "Kaggle", images_folder_names[patient], fsep="/")
  mask_folder_path <- file.path("/Users", "andrealetaalfonso", "Desktop","TFG", "images", "Kaggle", mask_folder_names[patient ], fsep="/")
  
  images = load_dicom(filename = images_folder_path, mask_filename = mask_folder_path)
  images = first_order(RIA_data_in = images)
  res[patient] <- RIA:::list_to_df(images$stat_fo$orig)
}
## Warning in oro.dicom::create3D(dcmImages_mask, mode = mode_in, transpose =
## transpose_in, : ImagePositionPatient is moving in more than one dimension.

## Warning in oro.dicom::create3D(dcmImages_mask, mode = mode_in, transpose =
## transpose_in, : ImagePositionPatient is moving in more than one dimension.

## Warning in oro.dicom::create3D(dcmImages_mask, mode = mode_in, transpose =
## transpose_in, : ImagePositionPatient is moving in more than one dimension.

## Warning in oro.dicom::create3D(dcmImages_mask, mode = mode_in, transpose =
## transpose_in, : ImagePositionPatient is moving in more than one dimension.

## Warning in oro.dicom::create3D(dcmImages_mask, mode = mode_in, transpose =
## transpose_in, : ImagePositionPatient is moving in more than one dimension.
results <- do.call(cbind.data.frame, res)
colnames(results) <- seq(1, length(images_folder_names))
rownames(results) <- name_characteristics
paged_table(results)
results <- do.call(cbind.data.frame, res)
colnames(results) <- seq(1, length(images_folder_names))
rownames(results) <- name_characteristics
paged_table(results)


Information about the patients

We have a .csv file with information about the patients.

Let’s open the file and view some of the rows.

data_patients <- read.csv('/Users/andrealetaalfonso/Desktop/TFG/images/train.csv')
data_patients <- as.data.frame(data_patients)
head(data_patients)
##                     Patient Weeks  FVC  Percent Age  Sex SmokingStatus
## 1 ID00007637202177411956430    -4 2315 58.25365  79 Male     Ex-smoker
## 2 ID00007637202177411956430     5 2214 55.71213  79 Male     Ex-smoker
## 3 ID00007637202177411956430     7 2061 51.86210  79 Male     Ex-smoker
## 4 ID00007637202177411956430     9 2144 53.95068  79 Male     Ex-smoker
## 5 ID00007637202177411956430    11 2069 52.06341  79 Male     Ex-smoker
## 6 ID00007637202177411956430    17 2101 52.86865  79 Male     Ex-smoker

The data we are provided with means the following:

We will only select the information about the patients that we have.

data_patients[data_patients$Patient %in% images_folder_names,]
##                       Patient Weeks  FVC   Percent Age    Sex SmokingStatus
## 1   ID00007637202177411956430    -4 2315  58.25365  79   Male     Ex-smoker
## 2   ID00007637202177411956430     5 2214  55.71213  79   Male     Ex-smoker
## 3   ID00007637202177411956430     7 2061  51.86210  79   Male     Ex-smoker
## 4   ID00007637202177411956430     9 2144  53.95068  79   Male     Ex-smoker
## 5   ID00007637202177411956430    11 2069  52.06341  79   Male     Ex-smoker
## 6   ID00007637202177411956430    17 2101  52.86865  79   Male     Ex-smoker
## 7   ID00007637202177411956430    29 2000  50.32713  79   Male     Ex-smoker
## 8   ID00007637202177411956430    41 2064  51.93759  79   Male     Ex-smoker
## 9   ID00007637202177411956430    57 2057  51.76145  79   Male     Ex-smoker
## 63  ID00019637202178323708467    13 2100  92.85872  83 Female     Ex-smoker
## 64  ID00019637202178323708467    14 2047  90.51514  83 Female     Ex-smoker
## 65  ID00019637202178323708467    16 2001  88.48110  83 Female     Ex-smoker
## 66  ID00019637202178323708467    18 2054  90.82467  83 Female     Ex-smoker
## 67  ID00019637202178323708467    20 2002  88.52532  83 Female     Ex-smoker
## 68  ID00019637202178323708467    26 2116  93.56622  83 Female     Ex-smoker
## 69  ID00019637202178323708467    38 1946  86.04908  83 Female     Ex-smoker
## 70  ID00019637202178323708467    53 1808  79.94694  83 Female     Ex-smoker
## 71  ID00019637202178323708467    66 1778  78.62038  83 Female     Ex-smoker
## 72  ID00020637202178344345685    18 2297 117.77071  66 Female  Never smoked
## 73  ID00020637202178344345685    19 2145 109.97744  66 Female  Never smoked
## 74  ID00020637202178344345685    21 2245 115.10459  66 Female  Never smoked
## 75  ID00020637202178344345685    23 2250 115.36095  66 Female  Never smoked
## 76  ID00020637202178344345685    25 2096 107.46514  66 Female  Never smoked
## 77  ID00020637202178344345685    31 2115 108.43929  66 Female  Never smoked
## 78  ID00020637202178344345685    44 2058 105.51682  66 Female  Never smoked
## 79  ID00020637202178344345685    54 1940  99.46678  66 Female  Never smoked
## 80  ID00020637202178344345685    70 1861  95.41632  66 Female  Never smoked
## 81  ID00023637202179104603099    -3 1536  65.30612  71 Female     Ex-smoker
## 82  ID00023637202179104603099     3 1368  58.16327  71 Female     Ex-smoker
## 83  ID00023637202179104603099     5 1361  57.86565  71 Female     Ex-smoker
## 84  ID00023637202179104603099     7 1465  62.28741  71 Female     Ex-smoker
## 85  ID00023637202179104603099     9 1681  71.47109  71 Female     Ex-smoker
## 86  ID00023637202179104603099    15 1461  62.11735  71 Female     Ex-smoker
## 87  ID00023637202179104603099    27 1337  56.84524  71 Female     Ex-smoker
## 88  ID00023637202179104603099    39 1304  55.44218  71 Female     Ex-smoker
## 89  ID00023637202179104603099    55 1406  59.77891  71 Female     Ex-smoker
## 699 ID00184637202242062969203    29 3130  67.31183  52   Male     Ex-smoker
## 700 ID00184637202242062969203    30 2965  63.76344  52   Male     Ex-smoker
## 701 ID00184637202242062969203    32 3197  68.75269  52   Male     Ex-smoker
## 702 ID00184637202242062969203    34 2761  59.37634  52   Male     Ex-smoker
## 703 ID00184637202242062969203    37 3009  64.70968  52   Male     Ex-smoker
## 704 ID00184637202242062969203    43 3175  68.27957  52   Male     Ex-smoker
## 705 ID00184637202242062969203    55 3021  64.96774  52   Male     Ex-smoker
## 706 ID00184637202242062969203    67 2991  64.32258  52   Male     Ex-smoker
## 707 ID00184637202242062969203    83 2743  58.98925  52   Male     Ex-smoker


Bibliography

Kumar et al. (2012)

Mackin et al. (2015)

Kolossváry et al. (2018)

Mayerhoefer et al. (2020)

Timmeren et al. (2020)

Rizzo et al. (2018)

Hofmanninger et al. (2020)

Hofmanninger, J., Prayer, F., Pan, J., Röhrich, S., Prosch, H., & Langs, G. (2020). Automatic lung segmentation in routine imaging is primarily a data diversity problem, not a methodology problem. European Radiology Experimental, 4(1), 1–13.
Kolossváry, M., Kellermayer, M., Merkely, B., & Maurovich-Horvat, P. (2018). Cardiac computed tomography radiomics. Journal of Thoracic Imaging, 33(1), 26–34.
Kumar, V., Gu, Y., Basu, S., Berglund, A., Eschrich, S. A., Schabath, M. B., Forster, K., Aerts, H. J. W. L., Dekker, A., Fenstermacher, D., Goldgof, D. B., Hall, L. O., Lambin, P., Balagurunathan, Y., Gatenby, R. A., & Gillies, R. J. (2012). Radiomics: The process and the challenges. Magnetic Resonance Imaging, 30(9), 1234–1248. https://doi.org/https://doi.org/10.1016/j.mri.2012.06.010
Mackin, D., Fave, X., Zhang, L., Fried, D., Yang, J., Taylor, B., Rodriguez-Rivera, E., Dodge, C., Jones, A. K., & Court, L. (2015). Measuring CT scanner variability of radiomics features. Investigative Radiology, 50(11), 757.
Mayerhoefer, M. E., Materka, A., Langs, G., Häggström, I., Szczypiński, P., Gibbs, P., & Cook, G. (2020). Introduction to radiomics. Journal of Nuclear Medicine, 61(4), 488–495.
Rizzo, S., Botta, F., Raimondi, S., Origgi, D., Fanciullo, C., Morganti, A. G., & Bellomi, M. (2018). Radiomics: The facts and the challenges of image analysis. European Radiology Experimental, 2(1), 1–8.
Timmeren, J. E. van, Cester, D., Tanadini-Lang, S., Alkadhi, H., & Baessler, B. (2020). Radiomics in medical imaging—“how-to” guide and critical reflection. Insights into Imaging, 11(1), 1–16.